ABSTRACT



A computer-implemented method of identifying suspect
masses in digital radiologic images and a system for 
computer-aided diagnosis of such images in which the 
images are thresholded at a large number of threshold levels 
to discriminate spots and a two stage classifier is applied to 
the spots. The first classification stage applies multiple rules 
predetermined from a training set of images, to a relatively 
computationally inexpensive set of initial features, namely 
area, compactness, eccentricity, contrast and intensity vari- 
ance for each spot. More computationally expensive 
features, namely edge orientation distribution and texture 
features, are computed only for spots that are accepted by 
the first classification stage to points for these spots in an 
expanded feature space. In the second classification stage, 
these points are classified as true positives or false positives 
in dependence on which mean of a plurality of clusters of
true positives and a plurality of clusters of false positives,
predetermined from the training set, is nearest in
Mahalanobis distance.

20 Claims, 6 Drawing Sheets 
MASS DETECTION IN DIGITAL
RADIOLOGIC IMAGES USING A TWO 
STAGE CLASSIFIER 

RELATED APPLICATIONS 

This application is related in subject matter to the following
prior commonly owned application and patents by the
same inventor as this application, which are incorporated
herein by reference:

1) U.S. application Ser. No. 08/699,182, filed Aug. 19,
1996, entitled "MASS DETECTION IN DIGITAL X-RAY
IMAGES USING MULTIPLE THRESHOLDS TO
DISCRIMINATE SPOTS", which is a continuation of U.S.
application Ser. No. 08/274,939, filed Jul. 14, 1994 and now
abandoned;

2) U.S. Pat. No. 5,572,565, issued Nov. 5, 1996, entitled
"AUTOMATIC SEGMENTATION, SKINLINE AND
NIPPLE DETECTION IN DIGITAL MAMMOGRAMS";

3) U.S. Pat. No. 5,579,360, about to issue on Nov. 26,
1996, entitled "MASS DETECTION BY COMPUTER
USING DIGITAL MAMMOGRAMS OF THE SAME
BREAST TAKEN FROM DIFFERENT VIEWING
DIRECTIONS".

BACKGROUND OF THE INVENTION 

1. Field of the Invention 

The present invention relates generally to methods of and
systems for computer-aided diagnosis of radiologic images
which are obtained in or are converted to digital form.
In its more particular respects, the present invention relates
to identification of spots that may correspond to tumors
using a two stage classification process in which a plurality
of feature measures of extracted spots are calculated, and in
a first classification stage, candidate suspect masses are
identified based on the calculated feature measures. Then in
a second classification stage so-called "true positives" are
identified from among the candidate suspect masses, thereby
rejecting so-called "false positives". The invention is
particularly pertinent to Computer-Aided Diagnosis of
Mammograms (CADM) but is also useful with respect to
computer-aided diagnosis of other common radiologic
images, for example, chest X-rays.

2. Description of the Related Art

A method of this general type is known from U.S. Pat. No.
5,289,374.

Early detection of breast cancer, the second most common
cancer in women in the United States, can significantly
increase the chances of survival. Such early detection
requires the taking and reading by radiologists or
mammographers of a large number of periodic screening
mammograms. Both the number of mammograms to be interpreted
and the difficulty of identifying masses and clusters of
microcalcifications therein, which are potential signs of
malignancy, motivate developments in Computer-Aided
Diagnosis of Mammograms (CADM) to at least mark suspect
areas to aid in the reading. There are related needs with
regard to the detection of lung nodules in a similarly large
number of chest radiographs.

Digital radiologic images suitable for computer-aided
diagnosis may be obtained by scanning film taken by
conventional X-ray equipment or by utilizing other X-ray
detector types that produce electronic image signals that
may be directly digitized without the necessity of producing
a film intermediate. These detector types include X-ray
image intensifier/camera chain, photostimulable phosphor
plate/laser readout (see U.S. Pat. No. 4,236,078), and
selenium plate/electrometer readout technologies. Such
technologies are progressing in their spatial resolution and
contrast sensitivities achieved and the latter two,
particularly, may soon find widespread use for mammographic
and chest radiographic applications.

In the prior art such as the cited U.S. Pat. No. 5,289,374,
it is known to calculate a variety of feature measures of
extracted spots relating to size, shape, edge gradient and/or
to contrast to perform a classification based on previously
acquired knowledge as to ranges or thresholds for such
feature measures or combinations thereof typically associated
with candidate suspect masses and/or true positives.
In addition to the aforementioned, from Chan et al.,
"Computer-aided classification of mammographic masses
and normal tissue: linear discriminant analysis in texture
feature space", Phys. Med. Biol. 40 (1995) 857-876, it is
known to use a plurality of texture features to distinguish
between masses and normal breast parenchyma in a data set
of square regions of the same fixed size in digital
mammograms.

While much research is ongoing in CADM, the combination
of an acceptably high rate of detection of actual true
positives, and an acceptably low number of false positive
detections per image, has proved elusive.

It is an object of the present invention to provide a method
of and system for computer-aided detection of suspect
masses in radiologic images which has a high detection rate
of actual true positives and a low number of detections of
false positives per image. It is further desired, in order to
provide such results in a sufficiently short processing time,
that the detection method have at least two stages of
classification such that the calculation of those feature measures
which are computationally expensive to calculate is deferred
until after a first coarse stage of classification in which
candidate suspect masses are identified using an initial set of
feature measures, corresponding to an initial
multi-dimensional feature space. Then in a second or final stage of
classification, further computationally expensive feature
measures are calculated only with regard to the determined
candidate suspect masses to identify the true positives
therein in an augmented or expanded feature space having
more dimensions than the initial feature space.

After a radiologic image to be read has been obtained
from an imaging apparatus, and if necessary, converted into
digital form, an overall region of interest is identified by an
automatic segmentation specific to the type of radiologic
image, e.g. to extract the breast from background in a
mammogram. Then spots or "connected components" are
extracted by thresholding the image at each gray level in a
relatively large range of gray levels (in excess of 20 and
typically approximately 50 consecutive gray levels) which
are determined from a histogram of gray levels. Each spot
discriminated at a gray level in the range is extracted. The
large number of gray levels is used in order that even a spot
which is discriminated as an island at only one threshold
level within the range of gray levels will be extracted.
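
The multi-level thresholding described above can be illustrated with a minimal Python sketch. It is not the Line-Adjacency Graph procedure referred to later in this description; a plain breadth-first flood fill is substituted for brevity, 8-neighbour adjacency is assumed, and the function names `connected_components` and `extract_spots` are hypothetical.

```python
from collections import deque


def connected_components(image, threshold):
    """Return the pixel lists of 8-connected sets of pixels >= threshold."""
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]
    components = []
    for r0 in range(rows):
        for c0 in range(cols):
            if image[r0][c0] < threshold or labels[r0][c0]:
                continue
            label = len(components) + 1
            labels[r0][c0] = label
            pixels, queue = [], deque([(r0, c0)])
            while queue:
                r, c = queue.popleft()
                pixels.append((r, c))
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        rr, cc = r + dr, c + dc
                        if (0 <= rr < rows and 0 <= cc < cols
                                and not labels[rr][cc]
                                and image[rr][cc] >= threshold):
                            labels[rr][cc] = label
                            queue.append((rr, cc))
            components.append(pixels)
    return components


def extract_spots(image, low, high):
    """Threshold at every gray level in [low, high]; keep every spot found."""
    spots = []
    for level in range(low, high + 1):
        for pixels in connected_components(image, level):
            spots.append((level, pixels))
    return spots
```

Because every level in the range is tried, a spot that separates from its surroundings at only one level still appears in the output, at the cost of the same mass being reported several times at neighbouring levels.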

Prior to entering upon the first stage of classification, the
initial set of feature measures, namely, area, compactness,
eccentricity, contrast and intensity variance, is calculated for
each extracted spot. The initial set of feature measures is
used by the first stage of classification to identify candidate
suspect masses.
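
As a rough illustration of the five initial feature measures, the following Python sketch computes them for one spot given as a list of (row, column) pixels. Several details are assumptions made only for this example: compactness is taken as perimeter squared over area, the perimeter is counted as exposed pixel edges, eccentricity is the ratio of the largest to the smallest distance from the bounding-box center to a boundary pixel, and the contrast ring is one 8-neighbour dilation wide. The function name `initial_features` is hypothetical.

```python
import math


def initial_features(image, pixels):
    """Return (area, compactness, eccentricity, contrast, variance)."""
    spot = set(pixels)
    area = len(spot)
    grays = [image[r][c] for r, c in spot]
    mean = sum(grays) / area
    variance = sum((g - mean) ** 2 for g in grays) / area

    # Perimeter (assumed definition): pixel edges facing a non-spot 4-neighbour.
    steps = ((-1, 0), (1, 0), (0, -1), (0, 1))
    perimeter = sum((r + dr, c + dc) not in spot
                    for r, c in spot for dr, dc in steps)
    compactness = perimeter ** 2 / area

    # Boundary pixels: spot pixels with at least one non-spot 4-neighbour.
    boundary = [(r, c) for r, c in spot
                if any((r + dr, c + dc) not in spot for dr, dc in steps)]
    rs = [r for r, _ in spot]
    cs = [c for _, c in spot]
    centre = ((min(rs) + max(rs)) / 2, (min(cs) + max(cs)) / 2)
    dists = [math.hypot(r - centre[0], c - centre[1]) for r, c in boundary]
    eccentricity = max(dists) / min(dists) if min(dists) > 0 else math.inf

    # Ring: pixels added by one 8-neighbour dilation, clipped to the image.
    rows, cols = len(image), len(image[0])
    ring = {(r + dr, c + dc)
            for r, c in spot for dr in (-1, 0, 1) for dc in (-1, 0, 1)
            if 0 <= r + dr < rows and 0 <= c + dc < cols} - spot
    ring_mean = sum(image[r][c] for r, c in ring) / len(ring) if ring else mean
    contrast = mean - ring_mean
    return area, compactness, eccentricity, contrast, variance
```

All five quantities need only one or two passes over the pixels of the spot, which is why they are cheap enough to evaluate for every spot at every threshold level.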

Those candidate suspect masses which have been identified
or accepted in the first classification stage are applied to
a second classification stage wherein each candidate suspect
mass is classified as either a true positive or a false positive.
Prior to doing this, the initial set of feature measures is
augmented by more computationally expensive feature
measures, in particular edge orientation variance and
so-called Laws texture features, to form an expanded set of
feature measures.

In accordance with the present invention, the initial stage
of classification is multiple-rule-based, the rules being
devised in a training phase from a training set of radiographs
in which true positives or roi's have been marked by a
radiologist or mammographer. An important aspect of the
present invention is that the final stage of classification is
based on comparison of the location in expanded feature
space of the vector of expanded feature measures of a
candidate spot with locations developed from mapping both
"true positives" and "false positives" in expanded feature
space in a training phase for the second stage of
classification.

Key aspects of the present invention result from a recog- 
nition that suspect masses correspond to points in feature 
space which are distributed in a plurality of clusters or 
subsets. 

With respect to the multiple-rule-based first classification 
stage, in the training phase an individual rule is devised for 
each cluster and in the operational phase an extracted spot is 
accepted as a candidate suspect mass if it passes any 
individual rule. 

The second classification stage is trained in a training
phase in which all spots from all images of the training set
which pass the first stage of classification are segmented
based on the markings of the radiologist into true positives
and false positives. The true positives are clustered into a
plurality of clusters in expanded feature space and the false
positives are separately clustered into a plurality of clusters
in expanded feature space. The means and covariance
matrices for each of the clusters are then determined.

In the operational phase of the second classification stage, 
a candidate suspect mass is classified in dependence upon 
which mean of a cluster is nearest in Mahalanobis distance. 
If the nearest cluster is a true positive, then the candidate 
suspect mass is classified as a true positive, and is accepted. 
Otherwise, the candidate suspect mass is classified as a false 
positive and is rejected. 

The present invention provides an extremely robust
computer-aided diagnosis method for which testing on
mammograms has indicated a detection rate in excess of 90% and
fewer than 3 false positive detections per image, on average.

BRIEF DESCRIPTION OF THE DRAWING 

Other objects, features and advantages of the present 
invention will become apparent upon perusal of the follow- 
ing detailed description when taken in conjunction with the 
appended drawing, wherein:

FIG. 1 is a schematic diagram of a computer-aided system 
in accordance with the invention for taking and processing 
mammograms; 

FIG. 2 is a flow chart indicating the processing performed 
by a computer in FIG. 1 in an operational phase; 

FIG. 3 is a histogram of the gray values of the pixels of 
a mammogram used to determine an interval of gray levels 
in a step of FIG. 2; 

FIG. 4 shows a connected component extracted in a step 
of FIG. 2; 

FIGS. 5A and 5B are flow charts indicating the training of
first and second classification stages of FIG. 2, respectively;
and

FIGS. 6A through 6E show so-called Laws texture kernels
used in calculating feature measures for the second
classification stage of FIG. 2.

DETAILED DESCRIPTION OF THE
PREFERRED EMBODIMENTS

Referring first to FIG. 1 of the drawing, there is shown a
computer-aided mammography system 10, with its
mammogram taking parts arranged for a cranio-caudal (CC)
view, including an X-ray source 12 directed to irradiate a
breast 14 of a standing subject with an X-ray beam 15. The
breast 14 is received and compressed between generally
planar lower and upper members 16, 18, using a
predetermined compression force or weight. Below lower member
16 is a two-dimensional X-ray detector means 20 for detecting
within a rectangular field of pixels, the X-ray radiation
passing through breast 14 and its immediate external
surround. X-ray detector means 20 is alternatively a film or a
photostimulable phosphor image plate received in a holder,
or a selenium plate/electrometer readout detector. An X-ray
image intensifier/camera chain is also a suitable detector
means. The X-ray source 12, plates 14 and 16 and detector
means 20 may be rotated as a unit about transverse axis A
to receive and irradiate breast 14 along any of the viewing
directions labeled in FIG. 1 as CC (cranio-caudal), LM or
ML (latero-medial or medial-lateral) and OB (oblique).

Whichever detector means 20 type is used, ultimately
there is a two-dimensional array of digital pixels,
representing the mammogram X-ray projection image, stored as an
image file in a digital storage device 22 which may comprise
a RAM, hard disk, magneto-optical disk, WORM drive, or
other digital storage means. When film is used, it is
developed and then scanned in a digitizer 24. Today, films may be
digitized to 100 micron spatial resolution, yielding typical
images ranging in size from 1672x2380 to 2344x3016
pixels, each up to 12 bit intensity resolution. When a
photostimulable plate is used, it is scanned by a laser in
scanner 26 yielding a similar image size and typically 10 bit
intensity resolution. Lastly, when a detector such as a
selenium plate/electrometer readout device is utilized, it
directly produces analog electrical signals that are converted
to digital form by its analog to digital converter 28.

The two-dimensional array of digital pixels stored in
device 22, representing the mammogram, is processed by
computer workstation 30 to mark or enhance features of
interest in the mammogram, including any identified suspect
masses or clusters of microcalcifications, and display the
resultant processed mammogram on display device 32, such
as a CRT monitor. It should be understood that the various
steps after the actual taking of the mammogram need not
necessarily follow immediately thereafter and need not be at
the same location as the taking of the mammogram.

As a preliminary step in the processing, the stored
mammogram may be reduced in resolution, spatially by a suitable
median filter, and/or in amplitude by truncation, to an image
on the order of 500,000 to 2,500,000 pixels and 8-bit to
10-bit intensity resolution consistent with the spatial and
gray scale resolution of the monitor. In particular, I have
found that square pixels of 400 microns on a side and 256
gray levels per pixel give acceptable results.

In the processing to mark or enhance features, the
mammogram is segmented into foreground, corresponding to the
breast, and background, corresponding to the external
surround of the breast, and the skinline is detected in the course
of this segmentation. The segmentation allows background
to be eliminated from the search for features of interest, such
as masses or clusters of microcalcifications, to be marked or
enhanced. The segmentation may be performed by the
method described in the aforementioned commonly owned
U.S. Pat. No. 5,572,565. The identification of suspect
clusters of microcalcifications is described in U.S. Pat. No.
5,365,429 entitled "Computer Detection of Microcalcifications
in Mammograms", which is also assigned to the same
assignee as the present invention.

Now referring to the flowchart shown in FIG. 2, the
identification of suspect masses in a two-dimensional
mammogram projection image will be described. It is assumed
that, as referred to heretofore, the original mammogram has
been reduced in spatial resolution to about 250,000 pixels
(e.g. 450x520) to form input digital radiologic image 34.
Then, in step 36, segmentation is performed by skinline
detection so that each pixel in the background has been
removed from further consideration, a histogram of the gray
values of the pixels in the foreground is formed, and a
relevant interval of gray levels for thresholding is
determined from the histogram.

A typical histogram is shown in FIG. 3 and is seen to be
subdividable into an interval "a" from the smallest gray level
S in the histogram to a gray level G, which interval
corresponds to the skin, and a narrower interval "b" from level G
to the largest gray level L in the histogram, which interval
corresponds to the interior of the breast. Interval "a" has a
substantially low number of pixels at each gray level while
interval "b" has a relatively high peak with steep sides. The
interval "b" is chosen as the relevant interval of gray levels
for thresholding. Gray level G is chosen such that interval
"b" is twice the interval "c" between gray level L and the gray
level P at the peak of the histogram.

In accordance with the invention, each gray level in
interval "b" is used as a threshold. Typically, in a 256 gray
level image, interval "b" contains at least twenty gray
levels, and often more than 50. In step 38, conveniently,
these gray levels are successively used as a current threshold
level in either smallest to largest, or largest to smallest, order
to threshold the image. At each current threshold level, the
result is a binary image, or a gray scale image in which pixels
having an intensity less than the threshold level are assigned
the value zero.

In step 40, spots referred to as "connected components"
(CCs) are extracted from the thresholded image. Each
connected component is a set of pixels having non-zero
values, in which any two pixels of the set are ultimately
connected to each other via a run of adjacent pixels in the
set. These sets are identified conveniently by the following
phases: a) generating a Line-Adjacency Graph (LAG), b)
scanning the LAG to determine the number of different
connected components (CCs), and c) again scanning the
LAG to create a mask image and several summary arrays
that define and describe each CC.

The method to create an LAG in phase a) above is based
on the description in the book "Algorithms for Graphics and
Image Processing" by Pavlidis, Computer Science Press,
1982, pp. 116-120. It consists of, for each line of the
thresholded image, finding runs of adjacent non-zero valued
pixels, comparing the position of the runs with those on the
prior adjacent line, and recording any overlap.

Although the LAG specifies which lines overlap, it does
not define a connected component. Thus in phase b), each
record of overlapping runs is scanned to determine to which
CC each run belongs. Along the way, the total number of
connected components is computed.

Once the set of CCs is known, then in phase c) a mask
image and several data objects to define each CC are
computed. The mask image is essentially the thresholded
image in which all non-zero pixels contain the number of the
CC to which they belong. The additional data objects
include an array defining a bounding box B (minimum and
maximum column and row) for each CC. A CC and its
bounding box B are shown in FIG. 4.

After the extraction of connected components in step 40,
an initial set of five relatively computationally inexpensive
feature measures is calculated individually for each CC in
step 42. Of this initial set, preferably the variance measure
Var of the intensities of the pixels in each CC is calculated
first in accordance with the following equation:

$$\mathrm{Var} = \frac{1}{n} \sum_{i=1}^{n} (g_i - \mu)^2$$

where g_i is the gray value of the i-th pixel of the CC, \mu is
the mean gray value of the CC, and n is the number of pixels
in the CC. Thereafter the connected components are smoothed
by erosion and then dilation. Preferably a structuring
element which is a 3 by 3 matrix of ones is used for both
operations, so that irregularities in the
boundaries of the extracted regions are smoothed and small
voids are eliminated. Then the other four feature
measures are calculated for each smoothed CC, namely area
(Area), compactness (Compact), contrast (Contrast) and
eccentricity (Ecc).

The compactness measure Compact is computed for each
CC as follows:

$$\mathrm{Compact} = \frac{P^2}{A}$$

where P is the perimeter and A is the area (Area) of the CC.
This measure is minimum for a circle.

The contrast measure Contrast is calculated by subtracting
the average gray value in a ring outside the connected
component from the average gray value inside the connected
component. The ring is obtained by dilating the connected
component, then keeping only the new pixels.

The eccentricity measure Ecc is calculated as follows:

$$\mathrm{Ecc} = \frac{r_{max}}{r_{min}}$$

where, as shown in FIG. 4, r_max and r_min are the maximum
and minimum distances between the center of bounding box
B and the perimeter of the CC.

The calculated initial set of feature measures is applied
to a multiple-rule based first classification stage 44
to identify candidate suspect masses from among the CC's.
The operational phase of this stage is best understood by first
discussing its training phase.

The training phase for first classification stage 44, as
shown in FIG. 5A, utilizes a set of training images in which
regions of interest (roi's) corresponding to actual true
positives have been marked by a radiologist. Satisfactory results
have been obtained with a training set of 43 images.
In the first step 54 of this training phase, images are
interactively segmented to extract areas (CCs) that match
the radiologist's markings. This step involves interactive
choice of a range of threshold levels which discriminate the
marked CCs. Then in step 56, the initial set of five feature
measures (Var, Area, Compact, Contrast and Ecc) is
calculated for each extracted CC in the same manner as in step 42
of FIG. 2. The initial set of feature measures for each CC
may be viewed as a point for each CC mapped in a
five-dimensional initial feature space. Then, in step 58, a
K-means clustering is performed to cluster these CC's in
initial feature space into k clusters, where k is empirically
determined such that no cluster contains only a very few
points. The K-means algorithm is well known, e.g. from the
book Hartigan, "Clustering Algorithms", John Wiley & Sons,
1975, Chapter 4. Each hyper-rectangle that encloses a
cluster is used to devise a separate rule of the form:

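$$(\mathrm{Area} \le \mathrm{Area}_{max}) \ \mathrm{AND}\ (\mathrm{Area} \ge \mathrm{Area}_{min}) \ \mathrm{AND}\ (\mathrm{Compact} \le \mathrm{Compact}_{max}) \ \mathrm{AND}\ (\mathrm{Contrast} \ge \mathrm{Contrast}_{min}) \ \mathrm{AND}\ (\mathrm{Ecc} \le \mathrm{Ecc}_{max}) \ \mathrm{AND}\ (\mathrm{Var} \le \mathrm{Var}_{max}).$$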

These rules are used in the first classification phase 44 of 
FIG. 2, such that a connected component is accepted as a 
candidate suspect mass if its initial set of feature measures 
satisfies any of the k rules. 
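
The acceptance test of the first classification stage can be sketched in Python as follows. The sketch assumes each rule is held as a dictionary of limits, with `None` standing for an absent area limit; the key names and the functions `passes_rule` and `is_candidate` are hypothetical, and the direction of each inequality follows the rule form as read here (upper limits on area, compactness, eccentricity and variance, lower limits on area and contrast).

```python
def passes_rule(features, rule):
    """True if the feature point lies inside the hyper-rectangle of one rule."""
    area_ok = ((rule["area_max"] is None or features["area"] <= rule["area_max"])
               and (rule["area_min"] is None or features["area"] >= rule["area_min"]))
    return (area_ok
            and features["compact"] <= rule["compact_max"]
            and features["contrast"] >= rule["contrast_min"]
            and features["ecc"] <= rule["ecc_max"]
            and features["var"] <= rule["var_max"])


def is_candidate(features, rules):
    """A spot is a candidate suspect mass if it satisfies ANY of the rules."""
    return any(passes_rule(features, rule) for rule in rules)


# Two illustrative rules; the limit values are placeholders for this example.
EXAMPLE_RULES = [
    {"area_max": 900, "area_min": None, "compact_max": 20,
     "contrast_min": 4, "ecc_max": 2.3, "var_max": 50},
    {"area_max": 650, "area_min": 350, "compact_max": 20,
     "contrast_min": 12, "ecc_max": 3.8, "var_max": 50},
]
```

Taking the logical OR over per-cluster rules keeps the accepted region tight around each cluster of marked masses, whereas a single hyper-rectangle around all clusters would also admit the empty space between them.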

A typical set of rules for k=8 clusters uses the limit values
summarized in the following table:

k   Area_max  Area_min  Compact_max  Contrast_min  Ecc_max  Var_max
1   900       none      20           4             2.3      50
2   2500      none      25           4             2.8      35
3   650       350       20           12            3.8      50
4   650       none      20           2             3.8      50
5   2300      260       27           7             3.6      10
6   1500      600       25           10            2.6      125
7   650       450       27           9             5.2      20
8   none      2000      27           3             19       50


In step 46 of FIG. 2, five more computationally expensive
further feature measures are calculated, namely edge
orientation and four Laws texture features, to augment the initial
set of five feature measures to an expanded set of ten feature
measures. The edge orientation feature was generally
suggested in Kegelmeyer, "Evaluation of Stellate Lesion
Detection in a Standard Mammogram Data Set", International
Journal of Pattern Recognition and Artificial Intelligence,
Vol. 7, no. 6, pp. 1477-1492, 1993, based on modeling the
architectural distortion caused by stellate lesions, where the
tumor is surrounded by spicules. In the region around the
tumor, edges will have many different orientations. Because
a normal mammogram has a duct structure which radiates
from the nipple, normal areas will have edges with similar
orientations. This feature measure is calculated by computing
edge orientation at each pixel in the bounding box B for
the candidate suspect mass. Then the histogram of the edge
orientations is computed and its flatness is measured by
computing standard deviation of the edge orientation
distribution. This manner of calculation differs from Kegelmeyer,
where a window of fixed size is centered about each pixel in
the image and edge orientation computed for pixels in the
window. The present invention differs because its window
size changes according to the size of the candidate suspect
mass. Hence, the window will always contain the entire
candidate suspect mass and will give a much more relevant
measure.
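
One possible reading of the edge orientation measure is sketched below in Python. The gradient operator (central differences), the number of histogram bins, and the choice to take the standard deviation of the normalized bin heights are all assumptions made for this example; the function name `edge_orientation_spread` is hypothetical. Under this reading a small value means a flat histogram, that is, edges pointing in many directions.

```python
import math


def edge_orientation_spread(image, bbox, bins=8, min_magnitude=1e-9):
    """Std. deviation of the normalized edge-orientation histogram in bbox.

    bbox = (row_min, row_max, col_min, col_max), inclusive bounds.
    """
    r0, r1, c0, c1 = bbox
    rows, cols = len(image), len(image[0])
    hist = [0] * bins
    # Skip the outermost image border so central differences stay in range.
    for r in range(max(r0, 1), min(r1, rows - 2) + 1):
        for c in range(max(c0, 1), min(c1, cols - 2) + 1):
            gx = image[r][c + 1] - image[r][c - 1]
            gy = image[r + 1][c] - image[r - 1][c]
            if math.hypot(gx, gy) < min_magnitude:
                continue  # no edge at this pixel
            theta = math.atan2(gy, gx) % math.pi  # orientation in [0, pi)
            hist[min(int(theta / math.pi * bins), bins - 1)] += 1
    total = sum(hist)
    if total == 0:
        return 0.0
    fractions = [h / total for h in hist]
    mean = 1.0 / bins
    return math.sqrt(sum((p - mean) ** 2 for p in fractions) / bins)
```

Because the bounding box is passed in per candidate, the window over which the histogram is built scales with the size of the candidate suspect mass, as the paragraph above describes.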

The Laws texture features are calculated by convolving
the image with a set of four 5x5 kernels designed to respond
to different local behaviors, followed by measuring various
statistics on the convolution images. These kernels are
chosen as suggested in K. I. Laws, "Textured Image
Segmentation", Ph.D. thesis, University of Southern
California, 1980, wherein they are named E5L5, R5R5,
E5S5, L5S5 (which are shown in FIGS. 6A-6D,
respectively), followed by computation of the local average
of absolute values. Preferably, according to the method of
the present invention, these measures are normalized by the
kernel L5L5, shown in FIG. 6E, and the normalized sum is
computed solely for the pixels in the bounding box B for the
candidate suspect mass.
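
A minimal Python sketch of the four texture measures follows. The one-dimensional vectors are the commonly published Laws vectors; forming each 5x5 kernel as the outer product of a vertical and a horizontal vector, and normalizing by dividing the summed absolute response by the summed absolute L5L5 response over the same pixels, are assumptions of this example. The function names are hypothetical. Correlation is used in place of convolution, which changes at most the sign of each response and therefore not its absolute value.

```python
L5 = [1, 4, 6, 4, 1]      # level
E5 = [-1, -2, 0, 2, 1]    # edge
S5 = [-1, 0, 2, 0, -1]    # spot
R5 = [1, -4, 6, -4, 1]    # ripple


def outer(vertical, horizontal):
    """5x5 kernel whose rows follow `vertical` and columns `horizontal`."""
    return [[a * b for b in horizontal] for a in vertical]


def response(image, kernel, r, c):
    """Kernel response centered on pixel (r, c)."""
    return sum(kernel[i][j] * image[r + i - 2][c + j - 2]
               for i in range(5) for j in range(5))


def laws_features(image, bbox):
    """Normalized texture energies inside bbox = (r0, r1, c0, c1), inclusive."""
    r0, r1, c0, c1 = bbox
    rows, cols = len(image), len(image[0])
    # Only pixels with a full 5x5 neighbourhood inside the image are used.
    coords = [(r, c)
              for r in range(max(r0, 2), min(r1, rows - 3) + 1)
              for c in range(max(c0, 2), min(c1, cols - 3) + 1)]
    kernels = {"E5L5": outer(E5, L5), "R5R5": outer(R5, R5),
               "E5S5": outer(E5, S5), "L5S5": outer(L5, S5)}
    level = outer(L5, L5)
    norm = sum(abs(response(image, level, r, c)) for r, c in coords)
    features = {}
    for name, kernel in kernels.items():
        energy = sum(abs(response(image, kernel, r, c)) for r, c in coords)
        features[name] = energy / norm if norm else 0.0
    return features
```

Dividing by the L5L5 response makes the measures insensitive to the overall brightness of the region, so that a bright and a dark region with the same texture score alike.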

The expanded set of ten feature measures is applied to the
second stage of classification in step 48 to separate the
candidate suspect masses into true positives and false
positives based on their proximity in feature space to true
positives and false positives mapped in a training phase
shown in FIG. 5B.

Since masses have different types with different
characteristics and false positives can arise due to different reasons,
it is expected that a single cluster in expanded feature space
containing all the true positives of the training sample will
tend to overlap a single cluster containing all the false
positives of the training sample. Hence determining
proximity of a point in ten-dimensional expanded feature space
corresponding to a candidate suspect mass to the mean of the
single cluster of true positives and to the mean of the single
cluster of false positives would not be useful. On the other
extreme, treating each of the true positives and each of the
false positives mapped in the training phase as a separate
class, as by a KNN (k-nearest neighbor) method, would be
quite time consuming because it would require the
calculation and sorting of the distances to all points mapped from
the training sample.

In accordance with the present invention, the true posi- 
tives and false positives obtained as indicated in the suc- 
ceeding paragraphs, are each separately clustered into an 
empirically determined plurality of clusters or subclasses. 
This reduces their overlap, and hence allows for better 
separation between true positives and false positives. 

During the training phase for the second stage of
classification, the steps 36 through 44 of FIG. 2 are applied
to the entire training set of images, including, as indicated in
step 60 of FIG. 5B, the extraction of regions which satisfy
any of the rules developed in the training phase for the first
classification phase, shown in FIG. 5A. Then in step 62, as
in step 46, the texture and edge orientation features are
calculated to form an expanded set of feature measures for
each extracted region. The resultant points in expanded
feature space are segmented into true positives (TP) and
false positives (FP) based on the radiologist's markings and
they are separately clustered in steps 63 and 64 into a
plurality of clusters. As with the clustering performed with
regard to the training phase for the first classification stage
shown in FIG. 5A, the K-means algorithm is used (although
other clustering approaches are possible), and the number of
clusters is chosen empirically so that no cluster contains
only a few points.

Then, for each of the plurality of clusters, the mean and
covariance matrix is calculated. The covariance matrix of a
cluster is defined as:

$$\Sigma = E\left[(\vec{X} - \vec{\mu})(\vec{X} - \vec{\mu})^{T}\right]$$

where the expected value of a matrix is found by taking the
expected values of its components. Here, \vec{X} is a ten (or in
general, d) component column vector of data values and
\vec{\mu} is a ten (in general, d) component column vector of mean
values.

In the operational phase of the second classification stage
48 of FIG. 2, the Mahalanobis distance from a point in
expanded feature space corresponding to a candidate suspect
mass is measured to the mean of each cluster of true
positives of the training set and to the mean of each cluster
of false positives of the training set and the candidate suspect
mass is classified as the same class as the cluster whose
mean is nearest in Mahalanobis distance. That is, if the mean
of a cluster of true positives is nearest, the candidate suspect
mass is classified as a true positive, whereas if the mean of
a cluster of false positives is nearest it is classified as a false
positive. Mahalanobis distance r_i from the i-th cluster is
defined as:

$$r_i^{2} = (\vec{X} - \vec{\mu}_i)^{T} \, \Sigma_i^{-1} \, (\vec{X} - \vec{\mu}_i)$$

where \Sigma_i^{-1} is the inverse of the covariance matrix for the i-th
cluster, \vec{X} is the feature vector in expanded feature space for
the candidate suspect mass, and \vec{\mu}_i is the mean of the i-th
cluster.
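
The nearest-cluster decision of the second classification stage can be sketched in Python as follows. The sketch uses only the standard library; the helper names (`mean_and_covariance`, `solve`, `mahalanobis_sq`, `classify`) and the representation of a cluster as a (label, mean, covariance) tuple are assumptions of this example. The covariance is estimated with a 1/n factor, and the inverse is never formed explicitly: a linear system is solved instead.

```python
def mean_and_covariance(points):
    """Mean vector and covariance matrix (1/n normalization) of a cluster."""
    n, d = len(points), len(points[0])
    mu = [sum(p[j] for p in points) / n for j in range(d)]
    cov = [[sum((p[a] - mu[a]) * (p[b] - mu[b]) for p in points) / n
            for b in range(d)] for a in range(d)]
    return mu, cov


def solve(matrix, rhs):
    """Solve matrix * y = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular covariance matrix")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    y = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][k] * y[k] for k in range(i + 1, n))
        y[i] = (aug[i][n] - tail) / aug[i][i]
    return y


def mahalanobis_sq(x, mu, cov):
    """Squared Mahalanobis distance (x - mu)^T cov^-1 (x - mu)."""
    diff = [a - b for a, b in zip(x, mu)]
    return sum(d * y for d, y in zip(diff, solve(cov, diff)))


def classify(x, clusters):
    """Return the label ("TP" or "FP") of the cluster whose mean is nearest."""
    return min(clusters, key=lambda c: mahalanobis_sq(x, c[1], c[2]))[0]
```

Only one distance per cluster is evaluated for each candidate, in contrast to a k-nearest-neighbor scheme, which would need a distance to every training point.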

Lastly, in step 50, pixels belonging to a detected true
positive are assigned the value binary one, and in order to
reconcile duplicate detections of the same masses at different
threshold levels, a binary mask is formed as the union
of these binary pixels. The pixels in said mask having the
value binary one belong to suspect masses 52 which are to
be highlighted in the display 32.
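
The reconciliation of duplicate detections amounts to a pixelwise OR, which may be sketched in Python as follows. The function name `union_mask` and the representation of each detection as a list of (row, column) pixels are assumptions of this example.

```python
def union_mask(shape, detections):
    """Binary mask that is 1 wherever any accepted detection has a pixel."""
    rows, cols = shape
    mask = [[0] * cols for _ in range(rows)]
    for pixels in detections:
        for r, c in pixels:
            mask[r][c] = 1  # repeated hits from other threshold levels are harmless
    return mask
```

A mass accepted at several neighbouring threshold levels therefore contributes a single highlighted region rather than several overlapping ones.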

The algorithm presented here was trained on a training set
of 43 images which had been marked by a radiologist.
Thereafter, the operational phase was tested on a test set of
81 images which also had been marked. Excluding 7 hard
cases, the algorithm detected all the malignant masses in the
test set with an average of 2.8 false positives detected per
image. Since some images were different views of the same
breast (cranio-caudal and oblique), the true positive
detection rate was in excess of 90% when a true positive detection
was counted if it occurred in either view.

While the present invention has been described in
particular detail, it should be appreciated that numerous
modifications are possible within the intended spirit and scope of
the invention, which is defined in the appended claims.

What is claimed is: 35 

1. A computer-implemented method of identifying sus- 
pect masses in a stored input two-dimensional array of 
digital pixels derived from an output of a radiologic imaging 
apparatus comprising: 

a) thresholding at least a portion of the pixels of the stored 40 
input array to discriminate spots; 

b) fox each spot discriminated, in a first classification 
stage identifying whether said spot is a candidate 
suspect mass by: 

i) computing an initial set of feature measures of said 45 
spot including measures of shape and size, consti- 
tuting a location in an initial feature space, and 

ii) determining whether the computed initial set satis- 
fies predetermined criteria; and 

c) for each spot identified as a candidate suspect mass, in
a second classification stage determining whether said
spot is a true positive suspect mass or a false positive 
suspect mass by: 

i) computing further feature measures to form an 
expanded set of feature measures constituting a
location in an expanded feature space, and 

ii) comparing said location with predetermined loca- 
tions in expanded feature space corresponding to true 
positives and corresponding to false positives by 
finding which predetermined location is nearest to
the location of the spot, using a distance measure. 

2. A method as claimed in claim 1, wherein said prede-
termined locations in expanded feature space corresponding
to true positives and to false positives comprise a plurality
of predetermined locations corresponding to true positives
and a plurality of predetermined locations corresponding to
false positives. 

3. A method as claimed in claim 1, wherein said prede-
termined criteria comprise a plurality of rules organized
such that a spot is identified as a candidate suspect mass if 
its set of feature measures satisfies any of the plurality of 
rules. 

4. A method as claimed in claim 2, wherein said prede-
termined criteria comprise a plurality of rules organized
such that a spot is identified as a candidate suspect mass if 
its set of feature measures satisfies any of the plurality of 
rules. 

5. A method as claimed in claim 2, wherein each location
of said plurality of predetermined locations of true positives 
is determined from a different cluster of a plurality of 
clusters of true positives developed in a training phase for 
said first classification stage and each location of said 
plurality of predetermined locations of false positives is
determined from a different cluster of a plurality of clusters 
of false positives developed in said training phase. 

6. A method as claimed in claim 4, wherein each location 
of said plurality of predetermined locations of true positives 
is determined from a different cluster of a plurality of 
clusters of true positives developed in a training phase for 
said first classification stage and each location of said 
plurality of predetermined locations of false positives is 
determined from a different cluster of a plurality of clusters 
of false positives developed in said training phase. 

7. A method as claimed in claim 3, wherein each rule of
said plurality of rules is devised from a different cluster of 
a plurality of clusters of suspect masses in initial feature 
space developed in a training phase for said first classifica- 
tion stage. 

8. A method as claimed in claim 4, wherein each rule of
said plurality of rules is devised from a different cluster of 
a plurality of clusters of suspect masses in initial feature 
space developed in a training phase for said first classifica- 
tion stage. 

9. A method as claimed in claim 6, wherein each rule of
said plurality of rules is devised from a different cluster of 
a plurality of clusters of suspect masses in initial feature 
space developed in a training phase for said first classifica- 
tion stage. 

10. A method as claimed in claim 1, wherein said further
feature measures of a spot include a plurality of texture 
feature measures computed based solely on intensities of 
pixels within a bounding box, each of whose sides adjoin an 
edge of the spot.

11. A method as claimed in claim 2, wherein said further
feature measures of a spot include a plurality of texture 
feature measures computed based solely on intensities of 
pixels within a bounding box, each of whose sides adjoin an
edge of the spot.

12. A method as claimed in claim 3, wherein said further
feature measures of a spot include a plurality of texture 
feature measures computed based solely on intensities of 
pixels within a bounding box, each of whose sides adjoin an 
edge of the spot. 

13. A method as claimed in claim 1, wherein said further
feature measures include a measure of edge gradient orien- 
tation distribution. 

14. A method as claimed in claim 2, wherein said further
feature measures include a measure of edge gradient orien- 
tation distribution. 

15. A method as claimed in claim 3, wherein said further
feature measures include a measure of edge gradient orien-
tation distribution.

16. A method as claimed in claim 10, wherein said further
feature measures include a measure of edge gradient orien- 
tation distribution. 

17. A method as claimed in claim 1, wherein said thresh-
olding is at, at least, 20 different threshold levels.

18. A method as claimed in claim 2, wherein said thresh-
olding is at, at least, 20 different threshold levels.

19. A method as claimed in claim 1, wherein said distance
measure is Mahalanobis distance. 

20. A system for producing a computer-enhanced radio- 
logic image comprising: 

means including an X-ray source, for irradiating a region
of a body being examined with X-ray radiation in a
predetermined viewing direction;

means for receiving the X-ray radiation exiting the region
from said viewing direction within a two-dimensional
field;

means for producing digital signals as a function of the
X-ray radiation received, which digital signals corre-
spond to an input two-dimensional array of digital
pixels;

a computer;

a digital memory means accessible to said computer;

means for, in response to said signals, storing said input
two-dimensional array of digital pixels in said digital
memory means;

wherein said computer is configured for processing the
stored input two-dimensional array of digital pixels by:

a) thresholding at least a portion of the pixels of the stored
input array to discriminate spots; 

b) for each spot discriminated, in a first classification 
stage identifying whether said spot is a candidate 
suspect mass by: 

i) computing an initial set of feature measures of said 
spot including measures of shape and size, consti- 
tuting a location in an initial feature space, and 

ii) determining whether the computed initial set satis- 
fies predetermined criteria; and 

c) for each spot identified as a candidate suspect mass, in 
a second classification stage determining whether said 
spot is a true positive suspect mass or a false positive 
suspect mass by:

i) computing further feature measures to form an 
expanded set of feature measures constituting a 
location in an expanded feature space, and 

ii) comparing said location with predetermined loca- 
tions in expanded feature space corresponding to true 
positives and corresponding to false positives by 
finding which predetermined location is nearest to 
the location of the spot using a distance measure. 